Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines

Authors

  • Oliver Rupp
  • Jennifer Becker
  • Karina Brinkrolf
  • Christina Timmermann
  • Nicole Borth
  • Alfred Pühler
  • Thomas Noll
  • Alexander Goesmann

Abstract

Chinese hamster ovary (CHO) cell lines are the most commonly used mammalian expression system for the production of therapeutic proteins. In this context, detailed knowledge of the CHO cell transcriptome could help to improve biotechnological processes carried out with specific cell lines. Nevertheless, until recently very few assembled CHO cDNA sequences had been publicly released, which severely limited biotechnological research. As a first step towards a publicly usable CHO cell genome/transcriptome analysis platform, two extended annotation systems with web-based tools were established: GenDBE for browsing eukaryotic genomes and SAMS for viewing eukaryotic transcriptomes. This is complemented by a new strategy for assembling ca. 100 million reads, sequenced from a broad range of diverse transcripts, into a high-quality CHO cell transcript set. The cDNA libraries were constructed from different CHO cell lines grown under various culture conditions and sequenced with Roche/454 and Illumina technologies; reads from a previous study were included as well. Two pipelines were established to extend and improve the CHO cell line transcripts. First, de novo assemblies were carried out with the Trinity and Oases assemblers using varying k-mer sizes. The resulting contigs were screened for potential coding sequences (CDS) with ESTScan, redundant contigs were filtered out with cd-hit-est, and the remaining CDS contigs were re-assembled with CAP3. Second, a reference-based assembly was performed with the TopHat/Cufflinks pipeline, using the recently published CHO-K1 draft genome sequence as the reference. In addition, the de novo contigs were mapped to the reference genome with GMAP and merged with the Cufflinks assembly using cuffmerge. With this second approach, 28,874 transcripts at 16,492 gene loci were assembled.
Combining the results of both approaches yielded 65,561 CHO cell line transcripts, which were clustered by sequence identity into 17,598 gene clusters.
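
The de novo post-processing chain (CDS screening, then redundancy filtering) can be illustrated with a minimal Python sketch. It is not the authors' pipeline and deliberately swaps in simpler techniques: a six-frame longest-ORF search stands in for ESTScan, which uses a hidden Markov model, and a greedy, longest-first filter in the spirit of cd-hit-est scores similarity by shared k-mers rather than by cd-hit-est's own identity calculation. All function names and thresholds (`min_orf=300`, `threshold=0.95`, `k=15`) are hypothetical.

```python
# Hypothetical sketch of the de novo post-processing described above.
# longest_orf() is a stand-in for ESTScan (which uses an HMM, not ORF search);
# greedy_nonredundant() mimics cd-hit-est's greedy longest-first clustering
# but scores similarity by k-mer containment instead of alignment identity.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG..stop ORF on either strand."""
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # position of the first ATG in the current open frame
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def kmers(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query: set, target: set) -> float:
    """Fraction of the query's k-mers that also occur in the target."""
    return len(query & target) / len(query) if query else 0.0


def greedy_nonredundant(contigs: dict, threshold: float = 0.95, k: int = 15) -> list:
    """Keep contigs longest-first, dropping any largely contained in a kept one."""
    kept, kept_kmers = [], []
    for name in sorted(contigs, key=lambda n: len(contigs[n]), reverse=True):
        fwd = kmers(contigs[name], k)
        rev = kmers(reverse_complement(contigs[name]), k)
        if any(max(containment(fwd, t), containment(rev, t)) >= threshold
               for t in kept_kmers):
            continue  # redundant with a longer representative (either strand)
        kept.append(name)
        kept_kmers.append(fwd)
    return kept


def filter_contigs(contigs: dict, min_orf: int = 300) -> list:
    """ORF screen first, then redundancy filter; returns kept contig names."""
    coding = {n: s for n, s in contigs.items() if longest_orf(s) >= min_orf}
    return greedy_nonredundant(coding)


contigs = {
    "long": "CC" + "ATG" + "GCT" * 100 + "TAA" + "GG",
    "redundant": "ATG" + "GCT" * 100 + "TAA",  # substring of "long"
    "distinct": "ATG" + "GAA" * 100 + "TAG",
    "noncoding": "ACGT" * 80,  # no ATG on either strand
}
print(filter_contigs(contigs))  # ['long', 'distinct']
```

Sorting longest-first means each kept contig is the longest member of its group, so shorter fragments contained in it are discarded; this mirrors the greedy incremental strategy of cd-hit-est, although the real tool is far more efficient on millions of contigs.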

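A gene locus can be thought of as a set of genome-mapped transcripts that overlap one another. The following minimal Python sketch works under that assumption only; it is hypothetical and not the Cufflinks/cuffmerge algorithm, which applies its own rules. Each transcript is reduced to chromosome, strand and span (the `Transcript` type and `group_into_loci` function are illustrative names), and transcripts whose spans overlap on the same chromosome and strand are merged into one locus by a sort-and-sweep pass.

```python
# Hypothetical illustration: a "locus" here is a set of transcripts whose
# genomic spans overlap on the same chromosome and strand. This is not the
# Cufflinks/cuffmerge algorithm, only the underlying idea.
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # inclusive genomic start
    end: int     # inclusive genomic end


def group_into_loci(transcripts: list) -> list:
    """Return loci as lists of transcript names (connected overlap components)."""
    loci = []
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.strand, t.start))
    for _, group in groupby(ordered, key=lambda t: (t.chrom, t.strand)):
        current, current_end = [], None
        for t in group:
            if current and t.start <= current_end:
                # overlaps the locus being built: extend it
                current.append(t.name)
                current_end = max(current_end, t.end)
            else:
                if current:
                    loci.append(current)
                current, current_end = [t.name], t.end
        loci.append(current)  # every group has at least one transcript
    return loci


transcripts = [
    Transcript("t1", "chr1", "+", 100, 500),
    Transcript("t2", "chr1", "+", 450, 900),
    Transcript("t3", "chr1", "-", 120, 400),
    Transcript("t4", "chr1", "+", 1200, 1500),
    Transcript("t5", "chr2", "+", 100, 300),
]
loci = group_into_loci(transcripts)
print(f"{len(transcripts)} transcripts on {len(loci)} loci")  # 5 transcripts on 4 loci
```

Here t1 and t2 share a locus, while t3 overlaps t1 but lies on the opposite strand and therefore forms its own locus. Sorting by start and tracking the running maximum end is what lets a single pass find every chain of overlaps.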
Similar resources

TOPP - the OpenMS proteomics pipeline

MOTIVATION Experimental techniques in proteomics have seen rapid development over the last few years. Volume and complexity of the data have both been growing at a similar rate. Accordingly, data management and analysis are one of the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. In...

Construction of Mtb72F Plasmid as a DNA Vaccine Candidate for Mycobacterium tuberculosis

Background: With one-third of the world’s population infected, tuberculosis (TB) is one of the most common infectious diseases and a major public health problem, especially in developing countries. The efficacy of the BCG vaccine for controlling the disease in adults is poor. The development of an effective TB vaccine is a global objective. An effective tuberculosis vaccine should s...

Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)

Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systematic search for identification of new miRNAs in B. rapa using homology-based ...

Establishment a CHO Cell Line Expressing Human CD52 Molecule

Background: CD52 is a small glycoprotein with a GPI anchor at its C-terminus. CD52 is expressed by normal and malignant T and B lymphocytes and monocytes. Detectable amounts of soluble CD52 are present in the plasma of patients with CLL, and it could be used as a tumor marker. Although the biological function of CD52 is unknown, it seems that CD52 may be involved in migration and activation of T-cells...

Bioinformatics Study of the Effect of Brain-derived Neurogenic Factor (BDNF) on Gene Expression in SH-SY5Y Cell Line

Introduction: Considering the importance of the evaluation and identification of BDNF protective pathways, this study was conducted to analyze the expression rate of genes registered in the NCBI database to identify the genes expressed in SH-SY5Y cell line due to BDNF protection and oxidative stress and also to identify the protective pathways of BDNF. Method: In this study, bioinformatics and ...

Volume: 9

Publication date: 2014